Abstract

Recently, several types of Japanese-to-English 
machine translation systems have been devel- 
oped, but all of them require an initial process 
of rewriting the original text into easily translat- 
able Japanese. Therefore these systems are un- 
suitable for translating information that needs 
to be speedily disseminated. To overcome this 
limitation, a Multi-Level Translation Method 
based on the Constructive Process Theory has 
been proposed. This paper describes the ben- 
efits of using this method in the Japanese-to- 
English machine translation system ALT-J/E. 

In comparison with conventional composi- 
tional methods, the Multi-Level Translation 
Method emphasizes the importance of the 
meaning contained in expression structures as a 
whole. It is shown to be capable of translating 
typical written Japanese based on the meaning 
of the text in its context, with comparative ease. 
We are now hopeful of carrying out useful ma- 
chine translation with no manual pre-editing. 

1 Introduction 

Recently, R&D efforts on machine translation
between languages of different families, such as
Japanese and English, have become popular
(Tomabechi 1987; Tomita 1987; MT Summit-I
1987). However, differences in perspective and in
how objects are conceived in such different
language families affect how expressions are
structured. These differences in expression
structure make it difficult to convert from one
language to another mechanically. For example,
in Japanese-to-English machine translation, the
more typically Japanese an expression is, the more
difficult it is to translate into English, because
the two languages express thought processes
differently.

As a means of solving this problem, efforts 
have been made in the area of sub-languages 
(Nagao 1985) or knowledge-based translation 
(Nirenburg 1989). But these methods currently 
require human intervention, that is, Japanese 
expressions must be rewritten into easily trans- 
latable Japanese. In other words, there is a need 
to re-write the text into a more English type of 
concept before machine translation can be per- 
formed. 

This rewriting is normally known as pre-editing
(Nagao 1989). Pre-editing techniques include:
using each word in only one sense; limiting the
use of 'joshi' (Japanese post-positional words), of
auxiliary verbs, and of other words likely to be
interpreted in several ways; restoring, in advance,
any words which may have been omitted; and
rewriting idiomatic expressions as more general
expressions. These all represent efforts to rewrite
the source into unambiguous Japanese which can be
translated into English literally.

The theory and rationale of pre-editing in
Japanese-to-English translation would appear
to be closely related to the principle of
elementary compositionality. Elementary
compositionality hypothesizes that "the meaning of
the entire expression is the sum of the meanings of
the various portions of the expression" (Nomoto
1986). This principle is taken as basic in existing
machine translation systems, and it supports a
highly effective method for translation between
languages of the same family. When seeking
high-quality machine translation, however, serious
problems still remain to be dealt with.

Japanese-to-English machine translation has 
reached the stage where sentences that allow 
word-by-word transfer from Japanese to En- 
glish, followed by assembly into the final sen- 
tence form (i.e. where literal translation is pos- 
sible), can be translated by current technology. 
But there is a wide difference in the thought 
process constituting the background of linguis- 
tic expression between the Japanese and English 
languages. Therefore, translations using exist- 
ing systems require pre-editing to re-write the 
original Japanese sentences into a form that will 
enable application of the elementary composi- 
tional method, or in other words, a form that 
can undergo literal translation. 

To go beyond the limits of conventional
translation methods based on elementary
compositionality, we have proposed the Multi-Level
Translation Method (Ikehara et al. 1987; Ikehara
et al. 1989; Ikehara 1989), based on the
Constructive Process Theory of Language (Tokieda
1941), and have built the experimental system
ALT-J/E (Automatic Language Translator, Japanese
to English).

This method focuses attention on two facts: the
meanings of words vary according to the manner
and context in which the words are used, and many
expressions have meanings that cannot be deduced
directly from the combination of the meanings of
the individual words. It is a method of
translation which grasps the structure and
meanings of expressions as a whole. With attention
focused on these characteristics, those units
having structural meanings have been arranged
systematically into a form of linguistic
knowledge. This knowledge is used in the analysis
of Japanese and in the conversion of Japanese into
English. As such, it represents a big step towards
a fundamental solution of problems hitherto
solvable only by pre-editing.



2 The Constructive Process Theory and the Multi-Level Translation Method

2.1 The Constructive Process Theory of Language

2.1.1 Problems of Conventional Translation Systems

The transfer method and the pivot method have 
been regarded as the methods most commonly 
used in machine translation (MT Summit-I 
1987). Whereas the pivot method hypothesizes 
an intermediate language common for both the 
original and the target language, the transfer 
system differs in that it uses an intermediate 
language for each language in order to convert 
meanings from one language to another. Both 
have in common the fact that they establish 
an intermediate language to represent meaning 
that is separate from the surface expression. 

The background of these methods can be traced to
the dualism of computational linguistics (Chomsky
1956; Chomsky 1965; Fillmore 1975), which
distinguishes between surface and deep structures.

But the deep structure as suggested by
computational linguistics cannot be said to have
achieved success. In fact, concepts which deny
the existence of deep structure have been
suggested of late (Cresswell 1973; Mendelson 1979;
Bresnan 1982).

Computational linguistics can be thought of as
derived from computational logic (Allwood et al.
1971). It hypothesizes that the meanings of
expressions do not depend on individual languages
but exist in a common form, and it also
hypothesizes that the meaning of an expression in
its entirety is the sum total of the meanings of
its parts. But these hypotheses are only partially
valid for actual languages. Thus, it would be
difficult to apply this theory of computational
linguistics to machine translation, which deals
with actual text, particularly to translation
involving a pair of languages with different
origins such as Japanese and English.

2.1.2 The Concept of the Constructive 
Process Theory of Language 

We believe that the key to solving this problem
lies with the linguistic evolution theory of
Tokieda Grammar (Tokieda 1941), one of the main
schools of traditional study of the Japanese
language. Tokieda Grammar is derived from the
theory of Norinaga Motoori (Motoori 1779) and was
developed from a critique of the linguistic theory
propounded by Saussure (Saussure 1909). It is
regarded as one of the four major theories of
Japanese grammar.

According to the Constructive Process The- 
ory of Language, language is to be understood 
as a compound body of processes as in the field 
of physics, and can be viewed as the relation- 
ship between the 'object', '(speaker's) recogni- 
tion' and 'expression'. The relationship between 
'object' and 'recognition' can be explained by 
'Epistemology' or 'Reflection Theory', and be- 
tween 'recognition' and 'expression' by 'Linguis- 
tic Norm'. The sole element that is common 
between two differing languages would be the
'object' and since there are differences in how 
the 'object' is viewed and understood between 
languages, everything beyond 'recognition' will 
differ depending on the language in question. 
The very existence of 'deep structure' which is 
neither 'object' nor 'recognition' is denied alto- 
gether. 

Also, according to Tsutomu Miura (Miura 
1967) who built on the Constructive Process 
Theory, the meaning of linguistic expressions is 
the relationship between 'object', 'recognition' 
and 'expression'. This relationship is objectively 
connected to the 'expression' itself. The concept 
of regarding "relationship" as meaning resem- 
bles the recent work in situation semantics (Bar- 
wise and Perry 1981). But where situation se- 
mantics confuses "meanings of expression" with 
"meanings of the field where the expression is 
placed", Miura Grammar draws a distinct line
between the two and propounds a theory
pertaining to "meanings of expression".

When language is regarded in this way as a
compound body of various processes, the following
two points, both of which place importance on
meaning, are seen as important for machine
translation.

1. Expressions are classified* into 'subjective'
expressions, which directly express the emotions,
intentions, and judgments of the speaker, and
'objective' expressions, which express the object
in the form of a concept. Both are to be
reproduced within the framework of the target
language.

2. The structure of the object is reflected in
its recognition, and this is further reflected in
the structure of the expression. Therefore, the
structure of an expression is to be considered
part of its meaning, and the meaning is to be
handled accordingly.

* Regarding the difference between subjective and
objective expressions, there is the theory of Port
Royal (Lancelot and Arnauld 1972), which predates
Norinaga Motoori.

2.2 The Multi-Level Translation Method

ALT-J/E has implemented the Multi-Level
Translation Method with due consideration of
the foregoing two points. The translation pro- 
cess is outlined in Figure 1. First, the Japanese 
expression is analyzed and separated into sub- 
jective and objective parts. The subjective part 
(for example, tense and aspect) is translated 
separately from the objective part. Second, the 
objective part is translated in three stages (the 
Multi-Level Transfer Method). If there are any 
idiomatic expressions, these are translated first, 
in the Idiomatic Expression Transfer. Then 
any expressions whose predicates and arguments 
match an entry in the semantic pattern dic- 
tionary are translated as part of the Semantic 
Valency Transfer. Finally, any remaining ex- 
pressions are translated by the General Pattern 
Transfer. The entire process is designed to pre- 
vent loss of meaning through elementary decom- 
position. 
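
The ordering of the three transfer stages for the objective part can be sketched as a simple cascade. This is only an illustration under stated assumptions, not ALT-J/E's implementation: the function names, the toy dictionaries, and the assignment of each example to a particular stage are hypothetical, and the entries are keyed on the noun itself for brevity, whereas the method described here constrains semantic categories.

```python
from typing import Callable, List, Optional, Tuple

Expr = Tuple[str, str]  # (o-marked noun, verb), in romanized Japanese

# Toy dictionaries. Which stage handles which expression is an assumption
# made for illustration; the paper does not list dictionary contents.
IDIOMS = {("koshi", "kakeru"): "sit down"}
VALENCY = {("mizu", "kakeru"): "pour", ("hashi", "kakeru"): "build"}
GENERAL = {"kakeru": "hang"}


def idiomatic(expr: Expr) -> Optional[str]:
    """Stage 1: the whole expression is listed as an idiom."""
    return IDIOMS.get(expr)


def semantic_valency(expr: Expr) -> Optional[str]:
    """Stage 2: predicate and argument match a semantic pattern entry."""
    return VALENCY.get(expr)


def general_pattern(expr: Expr) -> Optional[str]:
    """Stage 3: fall back to a default translation of the verb alone."""
    return GENERAL.get(expr[1])


STAGES: List[Tuple[str, Callable[[Expr], Optional[str]]]] = [
    ("idiomatic expression transfer", idiomatic),
    ("semantic valency transfer", semantic_valency),
    ("general pattern transfer", general_pattern),
]


def transfer(expr: Expr) -> Tuple[str, Optional[str]]:
    """Try the stages in order; return (stage name, translation) of the first hit."""
    for name, stage in STAGES:
        result = stage(expr)
        if result is not None:
            return name, result
    return "untranslated", None
```

Because the more specific stages are tried first, an expression only reaches the general pattern stage when no idiom or semantic pattern claims it, which is the point of the ordering.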

3 Organization of Linguistic Knowledge

3.1 Semantic Categories of Words 

Nouns are used to express existing objects as 
concepts. Depending on how the object is 
viewed and understood, various profiles of the 
object are picked up or discarded. Which noun 
is to be used is selected based on a profile cor- 
responding to the view of the speaker. 

In conceiving the object, the special and
individual characteristics are discarded and the
features are recognized as a single unit. Among
approaches that analyze semantic features, there
have been attempts to explain the meaning of nouns
as a bundle of detailed meanings or features. But
the concept that is represented by a noun is a
single conclusive unit of recognition. It is,
therefore, to be handled as an irreducible concept
that can only be captured as a whole. We classify
these concepts as semantic categories.

Figure 1: The Multi-Level Translation Method

For example, the objective concept represented by
the word school would include "the school as an
organization" and "the school as a given
location". In machine translation, there is a need
to know which of these the word school signifies.
To this end, we considered what types of profile
are conceived for the object when each noun is
used. These profiles were then classified as the
semantic categories held by each noun.

Around 3,000 categories were specified, about
the number of important words which the normal
person feels comfortable in using. The semantic
categories are ordered into two IS-A hierarchies:
the common noun semantic categories, some 2,800
categories (12 levels deep), and the proper noun
semantic categories, some 200 categories (9
levels deep). Based on this ontology, a semantic
word dictionary was compiled with 400,000 index
words. A word may be assigned at most 5 common
noun categories and 10 proper noun categories.
Overall, an average of 2 categories is assigned to
each noun in the dictionary.
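
An IS-A hierarchy of this kind, with several categories per noun, can be sketched as follows. The category names and the parent links below are invented for illustration (the actual 3,000-category hierarchy is not given here); only the two readings of "school" come from the text.

```python
# Hypothetical fragment of an IS-A hierarchy: child category -> parent category.
PARENT = {
    "organization": "agent",
    "agent": "concrete",
    "location": "place",
    "place": "concrete",
    "concrete": None,  # root of this toy hierarchy
}

# A noun may carry several semantic categories (the text's "school" example).
NOUN_CATEGORIES = {"school": ["organization", "location"]}


def is_a(category, ancestor):
    """True if `category` equals `ancestor` or lies below it in the hierarchy."""
    while category is not None:
        if category == ancestor:
            return True
        category = PARENT.get(category)
    return False


def readings(noun, required):
    """Categories of `noun` that satisfy the category a context requires."""
    return [c for c in NOUN_CATEGORIES.get(noun, []) if is_a(c, required)]
```

For instance, `readings("school", "place")` keeps only the location reading, while a context that requires an agent keeps only the organization reading.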

Projects using conceptual classifications similar
to our semantic categorization have previously had
around 30 to 50 categories. EDR (EDR 1990) is
implementing plans to extend to 500 categories.
ALT-J/E is the first case of a system with a
precision of some 3,000 categories and a
large-scale dictionary (around 400,000 index
words).



3.2 The Meaning of Expression Structures as Viewed from Declinable Words

In Japanese both verbs and adjectives are de- 
clinable. The basic structure of Japanese sen- 
tences revolves mainly around predicates. Look- 
ing at the declinable words, the meanings of 
the predicates themselves, and of their basic 
structure, can be understood by examining the 
types and meaning of nouns that fill the predi- 
cate's case frames. A semantic structure dic- 
tionary with some 6,000 index words (verbs 
and adjectives) consisting of 15,000 patterns has 
been prepared for use in analysis, transfer and 
generation. 

With this method, analysis is performed by 
having units of semantics and structure corre- 
spond to one another so that ambiguity in struc- 
tural analysis is reduced. Each Japanese entry 
has an English translation. As soon as the struc- 
ture of the Japanese is determined in the source 
language analysis, the basic English structure 
can be determined from the English form struc- 
ture in the semantic structure dictionary. This 
is helpful in avoiding the need for an additional 
conversion process. 
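
The way an entry in the semantic structure dictionary can pair a Japanese case-frame pattern with an English structure might be sketched as below. The data layout, the category names, and the English frame notation (SUBJ, OBJ, NI) are assumptions made for this sketch; the example sentences are the kakeru examples of Figure 2.

```python
# Hypothetical noun -> semantic category lookup (category names are invented).
NOUN_CATEGORY = {"mizu": "liquid", "hashi": "bridge", "meiwaku": "trouble"}

# Hypothetical dictionary entries: for each verb, a list of
# (constraints on case-marked nouns, English frame) pairs.
PATTERNS = {
    "kakeru": [
        ({"o": "liquid"}, "SUBJ pour OBJ on NI"),
        ({"o": "bridge"}, "SUBJ build OBJ NI"),
        ({"o": "trouble"}, "SUBJ cause NI OBJ"),
    ]
}


def select_frame(verb, cases):
    """Return the English frame of the first pattern whose constraints hold.

    `cases` maps a particle to the noun it marks,
    e.g. {"wa": "kanojo", "ni": "hana", "o": "mizu"}.
    """
    for constraints, frame in PATTERNS.get(verb, []):
        if all(NOUN_CATEGORY.get(cases.get(p)) == cat
               for p, cat in constraints.items()):
            return frame
    return None  # no semantic pattern applies
```

Since the matched entry already carries its English frame, choosing the pattern during analysis also fixes the basic English structure, with no separate conversion step.
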

4 Realization of New Functions

Among the functions which have been realized
through this method, the following will solve
problems previously requiring pre-editing.

4.1 Precise Selection of Translation According to Meaning

Previously, re-writing of the original text was 
required so that each word in the source would 
have an unambiguous translation in the target 
language. But, due to the rich information in 
the semantic structure and word dictionaries, 
it has now become possible to differentiate into 
precise translations as shown in Figure 2. Man- 
ual rewriting is no longer necessary. 

It has also become possible to translate typ- 
ically Japanese expressions which were previ- 
ously difficult to translate into English as well as 
to differentiate between translation of idiomatic 
expressions and general expressions. 

Further, experiments have made it clear that
translating the meanings of Japanese declinable
words (verbs and adjectives) into English, as
shown in Figure 2, requires detailed rules, and
that these rules in turn require a detailed
classification of semantic categories. A look at
the rules for the 15,000 cases registered in the
expression structure dictionary reveals frequent
use of semantic categories at the 8th and 9th
levels of the semantic category system. This shows
that at least the top nine levels of our ontology
(about 2,000 semantic categories) are needed to
successfully disambiguate most predicates.

4.2 Automatic Re-Writing Function in Japanese

There are many cases in which typical Japanese 
expressions, where two or more words are com- 
bined to form idiomatic expressions, cannot be 
literally translated and even if they were liter- 
ally translated, would be inappropriate in the 
English language. It would be advantageous to 
have such expressions automatically converted 
within the system into more easily translatable 
Japanese. But previous attempts to do this have 
foundered due to the problems of unwanted side 
effects. 



The Multi-Level Translation Method has enabled
a precise enumeration of conditions for the
application of rules through detailed semantic
categories. This reduces side effects and allows
the system to rewrite the Japanese effectively
prior to translation.

Figure 3 shows an example of a Japanese sen- 
tence, which normally has numerous predicates 
but which has been automatically rewritten so 
as to have fewer. Three Japanese verb phrases 
are changed into prepositional phrases in En- 
glish. 
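
A rewriting rule whose application is conditioned on a semantic category can be sketched as follows. This is a rough illustration, not the system's rule format: the regular-expression encoding and the category names are assumptions, and the "choshi-ni notte" counterexample (an idiom meaning roughly "getting carried away") is our own addition rather than one taken from the paper. The two rules mirror two of the rewrites shown in Figure 3.

```python
import re

# Hypothetical noun -> semantic category lookup (category names are invented).
CATEGORY = {"basu": "vehicle", "kawa": "river", "choshi": "abstract"}

# (pattern over romanized text, required category of the captured noun, replacement)
RULES = [
    (re.compile(r"(\w+)-ni notte"), "vehicle", r"\1-de"),    # ride X -> by X
    (re.compile(r"(\w+)-ni sotte"), "river", r"\1-zoi-ni"),  # go along X -> along X
]


def rewrite(sentence):
    """Apply each rule only where the noun's category meets its condition."""
    for pattern, required, replacement in RULES:
        def substitute(match, required=required, replacement=replacement):
            if CATEGORY.get(match.group(1)) == required:
                return match.expand(replacement)
            return match.group(0)  # condition not met: leave the text unchanged
        sentence = pattern.sub(substitute, sentence)
    return sentence
```

The category check is what limits side effects: "basu-ni notte" is rewritten to "basu-de", while "choshi-ni notte" has the same surface form but is left alone because its noun is not a vehicle.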

4.3 Supplementation of Ellipsed Elements through Context Processing

The Japanese language normally omits elements 
that are easily recoverable from context, par- 
ticularly subjects and objects. But in English, 
these elements are in most cases obligatory. Pre- 
viously, supplementing these constituted an im- 
portant part of pre-editing. 

ALT-J/E has, in addition to the semantic 
structure dictionary and semantic categories, in- 
troduced an analysis of the semantic categories 
of predicates which allows the supplementation
of ellipses using the semantic relations between 
sentences. 



4.4 Translation of Compound Words

The Japanese language generates new words 
(compound words) which are an amalgamation 
of a number of nouns, prefixes and suffixes (a 
characteristic of agglutinative languages). This 
type of compound word is generated without 
limitation and so it is impossible to have them 
all registered in a dictionary in advance. With 
conventional translation methods, registration 
of these compound words in the dictionary was 
an important issue for pre-processing. 

ALT-J/E uses semantic categories to analyze
compound words to find the semantic
relationships of their constituents. This function
makes the translation of unknown compound 
words possible. It also enables the automatic 
translation of compound words whose meanings 
vary depending on the manner in which they are 
used within a sentence. 

Differentiating translation of the verb kakeru 'hang'

kanojo-wa hana-ni mizu-o kaketa                She poured water on a flower.
haha-wa kamisama-ni gan-o kaketa               A mother made a vow to God.
watashi-wa karera-ni meiwaku-o kaketa          I caused them trouble.
kare-wa nikai-ni hashigo-o kaketa              He placed a ladder up to the second floor.
kensetsusho-wa koko-ni hashi-o kaketa          The Ministry of Construction built a bridge here.
kare-wa isu-ni koshi-o kaketeiru               He is sitting down on a chair.
karera-wa suna-o furui-ni kaketa               They sifted sand.
kanojo-wa mainichi roka-ni zokin-o kaketeiru   She mops up the corridor every day.
kanojo-wa purezento-ni ribon-o kaketa          She tied ribbon around a gift.
kanojo-wa shokutaku-ni teburukurosu-o kaketa   She spread a tablecloth on a dining table.
ano kissaten-wa modan-jazu-o kaketeiru         That coffee shop is playing modern jazz.

Differentiating translation of the noun mure 'group'

okami-no mure-ga hitsuji-no mure-o otta        A pack of wolves chased a flock of sheep.
kujira-no mure-ga sakana-no mure-o otta        A school of whales chased a shoal of fish.
ushi-no mure-ga hachi-no mure-ni osowareta     A herd of cattle was attacked by a swarm of bees.
hito-no mure-ga boto-no mure-ni kawatta        A group of people changed to a mob of a mob.

Figure 2: Precise Selection of Words in Translation



Original: kare-wa basu-ni notte gakko-e itta ga,  watashi-wa kawa-ni sotte
          He      bus     ride  school  went but, I          river   go along

          aruite  gakko-e itta.
          walking school  went.

          'He rode a bus and went to school, but I paralleled the river,
          walked and went to school.'

Rewrite:  kare-wa basu-de gakko-e itta ga,  watashi-wa kawa-zoi-ni toho-de
          He      by bus  school  went but, I          river along on foot

          gakko-e itta.
          school  went.

          'He went to school by bus, but I went to school on foot along
          the river.'

Figure 3: Automatic Re-Writing in Japanese

5 The Benefits of the Multi-Level Translation Method and Future Issues

5.1 Benefits of the Multi-Level Translation Method

The experimental Japanese-to-English machine 
translation system ALT-J/E, based on the 
Multi-Level Translation Method, is currently 
being debugged. To examine the potential of 
this method, newspaper lead sentences (a sum- 
mary preceding the newspaper article proper, 
generally consisting of 3 to 5 sentences per arti- 
cle, and averaging 20 words per sentence) were 
translated in the following experiments. 

Blind Test: (BT) 

Experiments conducted with articles cho- 
sen at random with no registration of un- 
known words, nor rule revisions. 

Window Test: (WT) 

Experiments on a sample of text with revi- 
sion of the system allowed. Registration of 
unknown words and rule revisions are con- 
ducted during the test. 

(In both cases, the original text was trans- 
lated without any pre-editing) 

The standards used for the evaluation are an
improved version of the ALPAC standards
(Automatic Language Processing Advisory Committee
1966), with 10 points being a 'perfect'
translation and grades of 6 or higher being a pass
(the sentence is understandable from reading the
translation only). Grading was conducted by
translation specialists from outside the company.
For each individual sentence, the average of the
grades given by three specialists in
Japanese-to-English translation was taken to
determine whether it passed or failed.
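
The pass criterion just described amounts to a small calculation, sketched below. The function names are ours and the grades in the sample are invented purely to show the arithmetic; they are not data from the experiments.

```python
from statistics import mean

PASS_THRESHOLD = 6  # grades of 6 or higher pass, on a 10-point scale


def sentence_passes(grades):
    """`grades` holds the three specialists' scores for one sentence."""
    return mean(grades) >= PASS_THRESHOLD


def pass_rate(all_grades):
    """Fraction of sentences whose average grade reaches the threshold."""
    results = [sentence_passes(g) for g in all_grades]
    return sum(results) / len(results)


# Invented scores for four sentences, for illustration only.
sample = [(7, 6, 8), (5, 6, 6), (4, 5, 3), (9, 9, 10)]
print(pass_rate(sample))  # two of the four averages reach 6, so this prints 0.5
```

Note that the second sentence fails even though two of its three graders gave it a 6, because the decision is made on the average (about 5.7) rather than by majority.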

The condition for a passing grade was that the 
meaning could be understood by looking only at 
the translation. Thus, sentences that were ruled 
as passing are not guaranteed to be stylistically 
appropriate (or even grammatical). But it is
estimated that a quality level equal to or better
than that of existing Japanese-to-English ma- 
chine translation systems has been achieved. 

According to this test, the pass rates were 40
to 50% for the blind test and over 60% for the
window test. This indicates a pass rate about
double that of existing Japanese-to-English
machine translation systems. For tests pertaining
to technical subjects (which are easier to
translate than the newspaper lead sentences), a
pass rate of 80% was achieved.

Based on the above results, we judge that the
Multi-Level Translation Method represents a major
step toward realization of a Japanese-to-English
machine translation system requiring no
pre-editing.

5.2 Future Issues 

The major problems currently being faced are the
need to improve the translation quality of long
sentences (of 30 words or longer) and the need
for overall improvement of the English in the
translated text. To meet this challenge, research
efforts are presently being focussed on an
extended Japanese-to-English transfer method
designed to analyze the meaning of the structure
of declinable words and to directly establish an
appropriate corresponding English structure. This
direct parse-tree transfer method will add a new
path to the three transfer paths for objective
expressions in the Multi-Level Transfer Method,
further improving and strengthening it.

Over the long term, research efforts are being
extended to include a review of the system of
parts of speech in the Japanese language and an
extension of the semantic hierarchy to multiple
dimensions.

6 Summary 

This paper has presented the results of using 
the Multi-Level Translation Method, based on 
the Constructive Process Theory. It has shown 
that the method enables a Japanese-to-English 
machine translation system to function effec- 
tively without manual pre-editing. In fact, the 
major reasons for pre-editing the source text 
are no longer valid. But there remain prob- 
lems with translating typically long Japanese 
sentences and a need to improve the quality of 
finished translations. 

We call the limited use of semantic information
in the Multi-Level Translation Method meaning
analysis. It is estimated that this level of
technology is limited to a maximum success rate of
approximately 80%. To attain a higher level of
accuracy it is essential to establish an
understanding of meaning in context, based on the
expansion of general and specialized knowledge of
the target domains. We call this meaning
comprehension. However, since it is difficult to
establish the comprehension of meaning in
extremely broad or general fields, we plan to
establish the limits of processing based on
meaning analysis first, and then follow up with
research into the area of meaning comprehension.